Coding Code: Investigating Students’ Data Science Skills with Qualitative Methods

Today’s layout


Investigating student learning through code

What research has been done?

A great deal of research has focused on what to teach in data science courses, but little has focused on how students learn data science concepts.


Thus far we have detailed…

  • concepts or competencies that ought to be included in data science programs

  • perspectives on when to teach data science

  • how to teach data science concepts

  • methods for integrating data science into the classroom

  • assorted topics to be considered in data science courses

Drawing on research in Computer Science Education

The Importance of Students’ Attention to Program State (Lewis 2012)


  • Attends to both the code produced by a student and their learning process

  • Pairs a student’s code with their debugging behavior side-by-side


Analyses of students’ code like these should not be few and far between. Students’ code offers a unique avenue for qualitative research in the teaching and learning of computing.

A framework for analyzing students’ code (Schulte 2008)

| | Text Surface | Program Execution | Function |
|---|---|---|---|
| Macrostructure | Understanding the overall structure of the program | Understanding the “algorithm” of the program | Understanding the goal / purpose of the program (in its context) |
| Relations | References between blocks, e.g., method calls, object creation | Sequence of method calls, object sequence diagrams | Understanding how sub-goals are related to goals, how function is achieved by subfunctions |
| Blocks | Regions of interest (ROI) that syntactically or semantically build a unit | Operation of a block, a method, or a ROI (as a sequence of statements) | Function of a block, may be seen as a sub-goal |
| Atoms | Language elements | Operation of a statement | Function of a statement, only understandable in context |

How could this look?


Atoms

with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))
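
As a minimal sketch of what “language elements” can mean at the atom level, R's own parser can split the statement above into tokens. This uses `parse()` and `utils::getParseData()` from base R. The names `stmt`, `tokens`, `atoms`, `calls`, and `symbols` are introduced here for illustration and are not part of the framework.

```r
# The atom-level statement from the slide, stored as text so that it is
# parsed but never evaluated (no dataset is needed).
stmt <- "with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))"

# keep.source = TRUE is required for getParseData() to return token data.
parsed <- parse(text = stmt, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Keep only terminal tokens: the individual language elements.
atoms <- tokens[tokens$terminal, c("token", "text")]

# Function calls and plain symbols are two kinds of atoms a coder might note.
calls   <- atoms$text[atoms$token == "SYMBOL_FUNCTION_CALL"]
symbols <- atoms$text[atoms$token == "SYMBOL"]

print(atoms, row.names = FALSE)
```

Listing tokens this way only surfaces the text-surface dimension. Describing the operation and function of each atom still requires a human reading the statement in context.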


Block

anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)  
summary(anterior)  
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))  
abline(anterior)  
plot(anterior)

Relationships Between Blocks


anterior <- lm(ProximateAnalysisData$PSUA~ProximateAnalysisData$Lipid)  
summary(anterior)  
with(ProximateAnalysisData, plot(PSUA~Lipid, las=1))  
abline(anterior)  
plot(anterior)


posterior2 <- lm(ProximateAnalysisDataOutlier$PSUP ~ ProximateAnalysisDataOutlier$Lipid)
summary(posterior2)
with(ProximateAnalysisDataOutlier, plot(PSUP~Lipid, las=1, xlab = "Whole-body Lipid Content (%)", ylab = "UP Fatmeter Reading"))
abline(posterior2)
plot(posterior2)
posterior2

How can this be used for learning trajectory research?

Descriptive coding


RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 1]


“Filters a vector of values using extraction operator, based on an equality relation with a variable selected from dataframe using $ operator”
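
To see the behavior this descriptive code refers to, the statement can be run against a small stand-in data frame. The values below are invented purely for illustration; only the names `RPMA2GrowthSub`, `Weight`, and `Age` come from the student's code.

```r
# Hypothetical stand-in for the student's dataset (values are made up).
RPMA2GrowthSub <- data.frame(
  Age    = c(1, 2, 1, 3),
  Weight = c(10.5, 20.1, 11.2, 30.4)
)

# `$` selects the Age variable; `==` builds a logical vector.
is_age_one <- RPMA2GrowthSub$Age == 1

# `[` (the extraction operator) keeps the weights where the test is TRUE.
age_one_weights <- RPMA2GrowthSub$Weight[is_age_one]

print(age_one_weights)
```

Splitting the statement into two steps makes each part of the descriptive code visible: the equality relation, the `$` selection, and the extraction.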

Uncovering emergent themes

linearAnterior <- lm(PADataNoOutlier$Lipid ~ PADataNoOutlier$PSUA)

early <- subset(RPMA2Growth, StockYear < 2006)  

Weight5 <- mean(RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 5], na.rm = TRUE)

gas <- gas[!(substr(gas$sampleID,3,3) %in% c("b","c")), ]   

obsD <- subset(gas, gas$carboy == "D")$N15_N2_Ar

lowerCIBound <- pMat[1:mlleIndex,1][which.min(abs(mlleCI+likelihoods[1:mlleIndex]))]

Data wrangling

Statements of code whose purpose is to prepare a dataset for analysis and / or visualization

Sub-themes

  • selecting variables
  • filtering observations
  • mutating variables
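
The three sub-themes can be illustrated with one base R statement each. This is a sketch on an invented data frame, `growth`, whose column names loosely echo the student code above; the values and the `WeightKg` column are hypothetical, and it does not reproduce any student's work.

```r
# Hypothetical data frame (all values invented for illustration).
growth <- data.frame(
  StockYear = c(2004, 2005, 2006, 2007),
  Age       = c(1, 5, 5, 2),
  Weight    = c(12, 480, 510, 95)   # assumed to be grams
)

# Selecting variables: keep a subset of columns.
selected <- growth[, c("StockYear", "Weight")]

# Filtering observations: keep rows that satisfy a condition.
early <- subset(growth, StockYear < 2006)

# Mutating variables: create a new column from an existing one.
growth$WeightKg <- growth$Weight / 1000
```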

An alternative direction


Process coding:

Uses gerunds (“-ing” words) to connote action in the data (Saldaña 2013)


  • Particularly relevant to describing the processes of human actions
  • Can be intertwined with time, such that actions can emerge, change, or occur in particular sequences.

Practical considerations

How much code should I collect?

  • Driven by the research question!
    • Amount of each student’s code
    • Number of students

How do readers trust my analysis?

  • Trust comes from:

    • confirmability
    • reliability
    • credibility
    • transferability
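
One common way to support the reliability criterion, assumed here rather than taken from the talk, is to have two researchers code the same statements independently and report their agreement. The sketch below computes Cohen's kappa in base R for two hypothetical coders; the theme labels are borrowed from the data wrangling sub-themes, and the assignments are invented.

```r
# Hypothetical codes assigned by two researchers to the same six statements.
themes  <- c("selecting", "filtering", "mutating")
coder_a <- factor(c("selecting", "filtering", "filtering",
                    "mutating", "selecting", "filtering"), levels = themes)
coder_b <- factor(c("selecting", "filtering", "mutating",
                    "mutating", "selecting", "selecting"), levels = themes)

# Cross-tabulate the two coders' decisions.
agreement <- table(coder_a, coder_b)
n <- sum(agreement)

# Observed agreement: proportion of statements given the same code.
p_observed <- sum(diag(agreement)) / n

# Agreement expected by chance, from each coder's marginal proportions.
p_expected <- sum(rowSums(agreement) * colSums(agreement)) / n^2

# Cohen's kappa: agreement beyond chance, scaled by the maximum possible.
cohen_kappa <- (p_observed - p_expected) / (1 - p_expected)
```

Kappa is only one lens on trustworthiness; the resources listed below discuss the other criteria in depth.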


Excellent resources: Creswell & Poth (2018); Merriam & Tisdell (2016); Miles et al. (2020)

Why is this important for data science education?

Theobold et al. (2023)


How can we distinguish merely interesting learning from effective learning (Wiggins and McTighe 2005)?

Questions?

References

Corbin, Juliet, and Anselm Strauss. 2008. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. Thousand Oaks, CA: Sage.
Creswell, J. W., and C. N. Poth. 2018. Qualitative Inquiry & Research Design. Thousand Oaks, CA: Sage.
Lewis, Colleen M. 2012. “The Importance of Students’ Attention to Program State.” Proceedings of the Ninth Annual International Conference on International Computing Education Research, September. https://doi.org/10.1145/2361276.2361301.
Merriam, S. B., and E. J. Tisdell. 2016. Qualitative Research. San Francisco, CA: John Wiley & Sons.
Miles, M. B., A. M. Huberman, and J. Saldaña. 2020. Qualitative Data Analysis. Thousand Oaks, CA: Sage.
Saldaña, J. 2013. The Coding Manual for Qualitative Researchers. Thousand Oaks, CA: Sage.
Schulte, Carsten. 2008. “Block Model.” Proceedings of the Fourth International Workshop on Computing Education Research, September. https://doi.org/10.1145/1404520.1404535.
Theobold, Allison S., Megan M. Wickstrom, and Stacey A. Hancock. 2023. “Coding Code: Qualitative Methods for Investigating Data Science Skills.”
Wiggins, G., and J. McTighe. 2005. Understanding by Design. 2nd ed. Alexandria, VA: Association for Supervision and Curriculum Development (ASCD).